[vllm] [cpu] [sagemaker] Add vLLM CPU inference image for SageMaker by timelfrink · Pull Request #5670 · aws/deep-learning-containers

timelfrink · 2026-02-13T07:54:15Z

GitHub Issue #, if available: N/A

Description

Add vLLM CPU-only inference image for SageMaker. Enables running vLLM on CPU instances for reranking, scoring, embeddings, and small generative models.

vllm/x86_64/cpu/Dockerfile.cpu — Multi-stage build from ubuntu:22.04, compiles vLLM v0.15.1 with VLLM_TARGET_DEVICE=cpu. Uses tcmalloc + Intel OpenMP, Python 3.12 via uv, reuses shared sagemaker_entrypoint.sh.
vllm/buildspec-cpu-sm.yml — Buildspec for CPU SageMaker target. Tag: 0.15.1-cpu-py312-ubuntu22.04-sagemaker.
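
The tag above reads as five hyphen-joined components: vLLM version, device, Python version, OS, and target platform. The sketch below only illustrates that layout. The helper name `build_image_tag` and its parameter names are hypothetical and do not come from the buildspec.

```python
def build_image_tag(version: str, device: str, python: str, os_version: str, platform: str) -> str:
    """Join tag components with hyphens, in the order seen in this PR's tag.

    Hypothetical helper for illustration; the real tag is set in the buildspec.
    """
    return "-".join([version, device, python, os_version, platform])


tag = build_image_tag("0.15.1", "cpu", "py312", "ubuntu22.04", "sagemaker")
print(tag)
```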

Manual testing on EC2 (c5.4xlarge):

Image builds successfully (~3.5 GB)
/health, /ping return 200
/v1/completions works (facebook/opt-125m)
/score and /invocations work with reranker (Alibaba-NLP/gte-multilingual-reranker-base)
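
The manual checks above could be scripted roughly as follows. This is a sketch under stated assumptions, not the procedure used for this PR: the base URL and port, the request bodies (in particular the `text_1`/`text_2` fields for scoring and reusing the same body for `/invocations`), and all function names are illustrative. It also assumes each model is served by its own container run.

```python
import json
import urllib.request

# Assumption: address of the locally running container; adjust host and port as needed.
BASE_URL = "http://localhost:8080"


def generation_checks(model: str) -> list:
    """Checks for a container serving a generative model (e.g. facebook/opt-125m)."""
    return [
        ("GET", "/health", None),
        ("GET", "/ping", None),
        ("POST", "/v1/completions", {"model": model, "prompt": "Hello", "max_tokens": 8}),
    ]


def rerank_checks(model: str) -> list:
    """Checks for a container serving a reranker model."""
    # Assumption: score requests take a model name plus a text pair.
    body = {"model": model, "text_1": "What is vLLM?", "text_2": "vLLM is an inference engine."}
    return [
        ("GET", "/health", None),
        ("GET", "/ping", None),
        ("POST", "/score", body),
        ("POST", "/invocations", body),
    ]


def run_check(method: str, path: str, body, base_url: str = BASE_URL, timeout: float = 30.0) -> int:
    """Send one request and return the HTTP status code (urlopen raises on 4xx/5xx)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

With a container running, each tuple from `generation_checks("facebook/opt-125m")` or `rerank_checks("Alibaba-NLP/gte-multilingual-reranker-base")` can be passed to `run_check(*check)`, and a return value of 200 corresponds to the results listed above.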

Tests Run

/buildspec vllm/buildspec-cpu-sm.yml
/tests sanity security

Formatting

N/A — No Python files in this PR (Dockerfile + YAML only)

PR Checklist

I've prepended the PR tag with the frameworks/job this applies to: [vllm] | [cpu] | [sagemaker]
This PR is fully backward compatible with pre-existing code
I've documented the DLC image/dockerfile this relates to
I've documented the tests I've run on the DLC image
I've reviewed the licenses of new binaries and dependencies

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Add support for vLLM CPU inference on SageMaker, aligned with official vLLM CPU Dockerfile patterns.

Features:
- Multi-stage build: base → vllm-build → vllm-cpu → sagemaker
- Uses uv package manager for fast dependency installation
- Python 3.12 via uv (not limited to system python)
- Build caching with --mount=type=cache for apt, uv, ccache
- Wheel-based install (build wheel, then install separately)
- Uses official vLLM requirements files (cpu.txt, cpu-build.txt)
- Intel OpenMP + tcmalloc for x86_64 CPU performance
- gcc-12 as explicit compiler version

New files:
- vllm/x86_64/cpu/Dockerfile.cpu: Multi-stage Dockerfile
- vllm/buildspec-cpu-sm.yml: Build configuration for SageMaker

Expected image tag: vllm:0.15.1-cpu-py312-ubuntu22.04-sagemaker

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Set image_size_baseline to 5000 (actual image ~3.5GB)
- Add ulimit -c 0 to disable core dumps (matches upstream)

timelfrink requested a review from a team as a code owner February 13, 2026 07:54

aws-deep-learning-containers-ci bot added the unauthorized label Feb 13, 2026

timelfrink and others added 2 commits February 13, 2026 08:57

[vllm] [cpu] [sagemaker] Fix image size baseline and add ulimit

eb25cb6


timelfrink force-pushed the feature/vllm-cpu-sagemaker branch from 7523dee to eb25cb6 Compare February 13, 2026 07:57

timelfrink wants to merge 2 commits into aws:master from timelfrink:feature/vllm-cpu-sagemaker
